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Abstract 

Autism Spectrum Disorders (ASDs) are childhood neurodevelopmental disorders with complex genetic origins. Previous 
studies have investigated the role of de novo Copy Number Variants (CNVs) and microRNAs as important but distinct 
etiological factors in ASD. We developed a novel computational procedure to assess the potential pathogenic role of 
microRNA genes overlapping de novo CNVs in ASD patients. Here we show that for chromosomes #1,2 and 22 the actual 
number of miRNA loci affected by de novo CNVs in patients was found significantly higher than that estimated by Monte 
Carlo simulation of random CNV events. Out of 24 miRNA genes over-represented in CNVs from these three chromosomes 
only hsa-mir-4436b-1 and hsa-mir-4436b-2 have not been detected in CNVs from non-autistic subjects as reported in the 
Database of Genomic Variants. Altogether the results reported in this study represent a first step towards a full 
understanding of how a dysregulated expression of the 24 miRNAs genes affect neurodevelopment in autism. We also 
propose that the procedure used in this study can be effectively applied to CNVs/miRNA genes association data in other 
genomic disorders beyond autism. 
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Introduction 

The Autism Spectrum Disorders (ASDs, MTM: 209850) are a 
heterogeneous group of childhood diseases characterized by 
abnormalities in social behaviour and communication, as well as 
patterns of restricted and repetitive behaviors [1]. The spectrum of 
autism reflects dimensional variability of each core symptom as 
well as the occurrence of co-morbid conditions (intellectual 
disability, epilepsy, dysmorphisms etc.). Like an epidemic, in part 
due to improved diagnostic tools, the global prevalence of autism 
and other pervasive developmental disorders has increased over 
the years reaching the current, impressive, figure of 1/160 
children [2]. Twin studies have demonstrated a much higher 
concordance rates for the disease in monozygotic twins (92%) than 
in dizygotic twins (10%) [3,4], indicating a strong genetic basis for 
autism susceptibility, also supported by the presence of autistic 
features in several monogenic disorders (e.g., Fragile X syndrome, 
Tuberous sclerosis). However, despite these progresses the identity 
of genetic factors still remains unknown in the majority of patients 
and it is likely, that overall, the causes of autism are more complex 
than previously thought involving an interaction of genetic, 
epigenetic and environmental factors all interfering with the 
normal course of neurodevelopment [5-7] . 

In 2007 Copy Number Variants (CNVs) were for the first time 
recognized as important genetic factors in ASD [8]. CNVs are a 
class of inherited or de novo genomic mutations duplicating 
("Gains") or deleting ("Losses") DNA segments > 1 Kb, thus 
altering the normal dosage of the overlapping genes. According to 



their frequency they can be divided into unique, rare (< 1%) or 
recurrent. Since 2007, many studies have investigated CNVs in 
autism, to assess their functional impact, the biological networks 
where the genes in these CNVs are involved and the general 
burden of CNVs in these individuals (e.g. see references [9-1 1]). 
Extended, multiplex families are more likely to carry heritable risk 
factors while, in contrast, sporadic ASD families have demon- 
strated a higher rate of de novo CNVs [8,12]. A striking observation 
was the high prevalence of de novo CNVs in both sporadic and 
familial cases of ASD compared with controls. Rare, de novo or 
inherited CNVs were observed in 5-10% of idiopathic ASD cases. 

Studies performed so far have highlighted the pathogenic role of 
CNVs in terms of dosage change for protein-coding genes [10,13- 
1 5] , without taking into account the potential involvement of non- 
coding RNA, particularly microRNAs genes (see reference [16] for 
a recent exception). 

miRNAs are an important class of post-trascriptional regulators 
each governing the expression of tens or even hundreds of proteins 
in both differentiated cells and during development. miRNA 
transcripts undergo several processing steps occurring first in the 
nucleus, then in the cytoplasm where the mature 22-25 
nucleotide-long miRNA [17,18] enacts mRNA translational 
repression or cleavage by binding to the 3 '-untranslated region 
of their respective target mRNAs [19]. Only a few studies have 
investigated the ASD transcriptome in post-mortem brain samples, 
so far [20—22]. Other studies have used mRNA from lympho- 
blastoid cell lines or peripheral blood cells isolated from patients 
[23-26]. Disruption of miRNA expression has been repeatedly 



PLOS ONE | www.plosone.org 



1 



March 2014 | Volume 9 | Issue 3 | e90947 



MicroRNAs and Copy Number Variants in Autism: A Whole-Genome Analysis 



reported in several microarray studies and believed to be linked to 
pathogenesis in autism [27-31]. However, lymphoblastoid cell 
lines are not the best proxy for neural tissue and only very few 
miRNAs displayed consistent dysregulation among different 
studies and patients. 

In this paper, we aim to expand our knowledge on the 
pathogenic role of miRNA in autism by investigating the 
associations between de novo GNVs and miRNA genes. For this 
reason we developed a novel and powerful computational 
procedure based on Monte Carlo randomization [32] and applied 
it to several published de novo CNV datasets from patients. Our 
positive findings consist in the identification of 24 miRNA genes 
over-represented in de novo CNV from chromosomes nos. 1,2 and 
22 and therefore likely to play a pathogenic role in autism. 

Results 

Study design and preliminary analyses 

The general strategy and steps of our study are outlined in the 
flow-chart of Figure 1. We developed the MAPCNVMIR 
programme to map all human miRNA genes from miRBase to 
178 de novo CNVs from 192 autistic patients ("APL datasets", see 
Table SI). Only 64 out of 1 78 CNVs were found to overlap at least 
one miRNA gene. In addition, 145 miRNA genes were identified 
within distinct or partly overlapping de novo CNVs (Table SI) 
spread over 20 chromosomes. No miRNA genes included in de novo 
CNVs were detected in chromosomes # 11, 13, 14, Y. 

For each chromosome we compared the number and the length 
of all human microRNA genes reported so far in miRBase with the 
length of the chromosome. The fractional length of miRNA genes 
over the size of the chromosome (Ri ratios) are of the order of 
10~ 5 for all chromosomes except for chromosome 19 (>10~ 4 ) 
(see Table 1). This result shows that the fraction of the length of 
the chromosome covered by microRNA genes is very similar for 
all chromosomes (except chr. 19). On the other hand, the 
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Figure 1 . Overview of this study and source data. In this study we 
have used previously published data from 192 autistic patients (the 
"APL datasets" of Table SI) bearing overall 178 de novo CNVs (118 
CNV_Losses and 60 CNV_Gains) with unique start and end positions. 
doi:1 0.1 371 /journal.pone.0090947.g001 



fractional length of CNVs over the size of the chromosome (R 2 
ratios),computed separately for CNV_Gains and CNV_Losses 
(Table 1) using the MAPCNVMIR software developed by us (see 
below), largely differ among the various chromosomes, from a 
value of 0.0006 up to value of 0.2618. This highlights that the 
fractional length of chromosomes covered by CNVs can be very 
dissimilar among chromosomes. Indeed, there are several 
instances (e.g., chr. # 10, 12, 15 and 22 for CNV_Gains and 
chr. # 7, 12, 16, 18, 21 and 22 for CNV_Losses) of CNVs 
affecting large regions of a particular chromosome. Therefore, an 
analysis was performed between the number of miRNA genes 
overlapping de novo CNVs and the R2 ratios in order to find 
possible correlations. Furthermore, the number of miRNA genes is 
also variable. Figures 2a and 2b show that, for many chromo- 
somes, the number of miRNA genes included in CNVs follow a 
linear trend as function of the total length of both CNV_Gains and 
CNV_Losses. In order to quantify this kind of correlation, values 
of correlation coefficients were calculated and were found to be 
larger for CNV_Gains (r = 0.75765) than CNV_Losses 
(r = 0.32732). This finding is consistent with the larger spread of 
CNV_Losses compared to CNV_Gains. On the other hand, some 
chromosomes do not follow the linear trend and are characterized 
by a large number of miRNA genes associated to CNVs (see 
Figure 2 and Table 1). For example,chromosome 22 for 
CNV_Gains and chromosome 2 for CNV_Losses appear as 
"outliers" since they include a much higher number of miRNA 
genes in de novo CNVs than expected. However, the latter analysis 
does not allow us to classify the observed dissimilarities according 
to a statistically significant criterion. 

Monte Carlo randomization 

In order to assess which "outliers" with a high number of 
overlaps are actually significantly different from the other 
chromosomes, we developed the SIMCNVMIR programme 
which implements a numerical analysis procedure for the 
identification of all instances (i.e., chromosomes) where the number 
of miRNA genes overlapping de novo CNVs is significantly higher 
than expected in case of random distribution of (simulated) CNVs. 
The steps of our computational analysis are reported in Figure 3. 
The results of this analysis are reported in Figure 4 and Table 2 
and show that for chromosomes #1,2, and 22 the actual number 
of miRNA loci affected by de novo CNVs in patients is significandy 
higher (FDR-adjusted p-values <0.05) than that estimated by the 
simulated random CNV events. Specifically, CNV_Gains in 
chromosome # 22 and CNV_Losses in chromosomes 1 and 2 
display an over-representation of microRNA genes (see Table 2 
and Figures 4). In Table 2, note that for CNV_Loss in 
chromosome # 2 the number, of "hits" is higher than the 
number of distinct miRNA genes ("Unique"), implying that the 
same miRNA gene is involved. Overall, there are 24 miRNA 
genes overlapping de novo CNVs in the three positive chromosomes 
(see Table 3). Only two, hsa-mir-4436b-l and hsa-mir-4436b-2, 
have not yet been detected in CNVs from the general population 
(i.e., the DGV database). 

Code scripts of the MAPCNVMIR and SIMCNVMIR 
programmes are available to readers from the following URL: 
http:// fisicaechimica.unipa.it/ cnvmirna/ 

Pathway analysis 

The two autism-specific miRNA genes (hsa-mir-4436 b-1 and 
has-mir-4436b-2), are deleted in a patient who bears one CNV of 
332,304 bp. Interestingly, they encode the same 3p and 5p mature 
microRNAs. According to Mirwalk [33], no validated targets are 
yet known for these two miRNAs. We therefore performed a 
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Figure 2. Correlation graph between the no. of miRNA genes in de novo CNVs and the CNV/Chr. lengths ratio (Ratio #2) For each 
chromosome, the number of miRNA genes associated to CNVs is plotted as a function of the fractional length of CNV over the chromosome's size for 
Gains (a) and Losses (b), respectively. The graphs show that whereas the majority of data points lay very close to the best-fit line, indicating that the 
two variables are positively correlated, few chromosomes instead behave as outliers in which certain CNVs appear to affect a no. of miRNA genes 
higher than expected (data used for the graphs were taken from Table 1). 
doi:1 0.1 371 /journal.pone.0090947.g002 



functional enrichment analysis on the predicted target mRNAs to 
highlight the significance of these two microRNAs in ASD 
pathogenesis. The results of this latter analysis, reported in 
Table 4, show that several pathways identified by miRPath have 
been already implicated in autism by previous studies, (in bold in 
Table 4): Lysine degradation [34], Drug metabolism - cytochrome 
P450 [35], Notch signaling pathway [31,36], HIF-1 signaling 
pathway [37], Vasopressin-regulated water reabsorption [38], 
Natural killer cell mediated cytotoxicity [39]. 2NSD1 [40] and 
AMT [41] have been previously identified as autism candidate 
genes. 

Discussion 

In recent years, Copy Number Variants and microRNAs have 
emerged as potentially important etiological factors in ASD. 
However, until recently, in nearly all the studies, these two topics 
have been investigated separately in the context of the research of 
autism. The pathogenic role of CNVs has been interpreted in 
terms of their effect on the function of the overlapping protein- 
coding genes. On the other hand, miRNAs have been studied with 
the aim of uncovering changes in their level of expression in cells 
isolated from patients (see the Introduction for references) vs. 
control cells. In our study the focus is on the potential pathogenic 
role played by miRNA genes/de novo CNVs, instead. For this 
reason, we developed a new computational procedure implement- 
ed in a Fortran-written programme which allows to detect over- 
representation of miRNA genes in de novo CNVs in each 
chromosome. By this computational analysis based on Monte 
Carlo simulations we found that in positive chromosomes (FDR- 
adjusted p values <0.05, see Table 2 for details) there is a 
probability of less than 5% to find, by chance, a number of 
miRNA genes included in CNVs (gain and/ or loss) higher than 
that actually detected in patients. Overall, twenty-four candidate 
susceptible miRNA genes of autism were identified in our study. 

Hereafter, we discuss several potentially critical aspects of this 
new procedure that may have biased our results and interpreta- 
tion. Firstiy, the results do not appear to be biased by the different 



distribution of miRNA genes in chromosomes as all chromosomes 
display very similar miRNA genes length/chromosomes€length 
ratios. The only exception was chr. 19 which has the highest ratio 
(1.47 x 10~ 4 ), but was not scored positive in a simulation analysis. 
Secondly, we considered if different ratios between the length of de 
novo CNVs and the length of the chromosome may have accounted 
for the detection of positive chromosomes. In general, we would 
expect the number of miRNA genes duplicated or deleted by a 
CNV to increase linearly with CNV size. However, such a 
linearity is not always followed, even for chromosomes displaying 
very similar ratios. Typical examples of this latter situation include 
CNV_Gains in chromosome 10 (ratio = 0.0822) vs. CNV_Gains in 
chromosome 22 (ratio = 0.0893) pair consisting of one negative 
(chr. 10) and one positive (chr. 22) chromosome. Thirdly, we could 
not perform a simulation analysis, to be used as a negative control, 
for CNVs detected in individuals from the general population, 
since the data stored in the Database of Genomic Variants 
generally refer to blood donors only and not to their parents, thus 
preventing ascertainment of de novo CNVs. However, it is worth 
mentioning here the results of a recent study by Marcinkowska et 
al [42], which are consistent with our findings. Indeed, these 
Authors found that miRNA loci are under-represented in highly 
polymorphic and well-validated CNVs from the general popula- 
tion [i.e., the Database of Genomic Variants). Fourthly, in Table 1 
the absence of "hits" for both CNV_Gain and CNV_Loss for 
chromosomes # 11, 13, 1 4, Y is simply due to the lack of autistic 
patients bearing de novo CNVs (Loss or Gain) overlapping miRNA 
genes (see Table SI). In turn, it can be speculated that, the lack of 
this type of patient may be ascribed to various factors such as the 
sample size, the use of low-resolution aCGH platforms, the 
occurrence of "protective" miRNA loci for autism in these 
chromosomes. Finally, a more suitable analysis could have 
involved more homogeneous CNV data from subjects (patients 
AND unaffected individuals) of the same ethnicity analyzed with 
aCGH platforms with similar resolution. This was indeed the case 
for the autistic sample (APL) we have used which was homogenous 
in relation to ethnicity in that all patients from the APL dataset 
were "Caucasians" (white north Americans and Europeans). In 
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Figure 3. Schematic representation of the counting process of miRNA genes included in de novo CNVs of autistic patients. The small 
black rectangles close to chromosome are the miRNA genes, whereas the various segments above represent the various CNVs within the 
chromosome. An "hit" is an overlap between a CNV and a miRNA gene. In a) four "hits" are shown, b) Four examples of random distributions of 
simulated CNVs within the chromosome, keeping fixed the lenght of each CNV and changing its start/end positions. Clockwise from top left, the 
numbers of "hits" are 2, 0, 6 and 1, respectively. For each chromosome we carried out 10 6 simulations, c) Finally, the histograms displaying the 
relative frequency of miRNA genes included in randomly located CNVs ("hits") are obtained and the comparison between experimental data and 
computed Monte Carlo distribution is performed. Red lines correspond to the no. of miRNA genes detected in de novo CNVs from patients, p-values 
reported in Table 2 are the areas of the histogram to the right side of the red line. (See also Figure 4). 
doi:1 0.1 371 /journal.pone.0090947.g003 



our study, the use of heterogeneous data concerns instead the 
different aCGH platforms used (APL and DGV datasets) and the 
mixed ethnicity of individuals reported in the Database of 
Genomic Variants. We decided to use such heterogeneous data 
to increase the chance of collecting a higher number of patients 
with de novo CNVs. This decision had its strengths and drawbacks. 
For instance, the use of different aCGH platforms may have 
caused an under-estimation of the number of CNV/ miRNA genes 
associations in positive chromosomes from the APL dataset. In 
contrast, the use of samples with mixed ethnicity from the DGV 
database does not seem to have limited the identification of 
miRNA genes/CNV association in common between DGV and 
APL datasets. In conclusion, though the use of heterogeneous 
CNV data may have limited the identification of additional 



miRNA/ CNV associations, it did not prevent the identification of 
chromosomes with an enrichment of CNVs overlapping miRNA 
genes. 

In our study, the occurrence of the same 22 deleted or 
duplicated miRNA genes detected in both patients and unaffected 
individuals (i.e., DGV) strongly suggest that they are low-penetrant 
risk factors for autism. Difference in penetrance for such 
duplicated/deleted miRNA genes would be explained by a variety 
of factors including: (i) prenatal exposure to enviromental risk 
factors [43], (if) presence/absence of functional SNPs in suscep- 
tibility protein-coding genes of autism [44] , (iii) epistasis [45] , (iv) 
epigenetic factors [6], (v) number and type of protein-coding genes 
co-existing in different CNVs overlapping the same miRNA. It is 
reassuring that other CNV studies have linked several miRNA 
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Figure 4. Histograms displaying the relative frequency of miRNA genes included in randomly located CNVs. For each chromosome, 
the SIMCNVMIR program computes the no. of miRNA genes affected by each randomly distributed CNV realizations and plots the frequency 
distribution corresponding to 10 6 realizations. The analyses were performed separately for CNV_Gains (a) and CNV_Losses (b). 
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Table 2. de novo CNVs from autistic patients with an overrepresented no. of miRNA genes. 
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15 


11 
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0.70911 


16 
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0.70911 


17 
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18 
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0.8421 1 


2 
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19 


N/A 


N/A 


N/A 
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0.64896 


20 


N/A 


N/A 


N/A 
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21 
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0.64896 
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18 


0.04349 
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1 



Unique: the no. of distinct miRNA genes overlapping de novo CNVs in patients. Hits: the no. of identical or distinct miRNA genes overlapping de novo CNVs in patients. 
FDR-adjusted p-value: probability of obtaining a number of miRNA genes overlapping randomly-distributed CNVs larger than hits after correction for multiple testing. In 
bold, chromosomes displaying significant FDR-adjusted p-values (<0.05). N/A: no de novo CNVs are present in the APL dataset. For several chromosomes, the no. of 
unique is smaller than the no. of hits indicating that the same microRNA genes are affected by different de novo CNVs. 
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genes from this group to autism, and include: hsa-mir- 1 306, hsa- 
mir-185, hsa-mir- 1286 and hsa-mir-649 genes [16,46], hsa-mir- 
200a and hsa-mir-429 [16,47], hsa-mir-200b and hsa-mir-149 
[16]. Furthermore, hsa-miR-185 displays an 1.44-fold upregula- 
tion in lymphoblastoid cell lines from autistic patients [48]. 
Interestingly, this latter finding is consistent with the presence of 3 
copies of the hsa-mir- 1 85 gene in the 2 patients from our APL 
dataset. 

To avoid an over-interpretation of our results we have adopted 
a stringent, conservative criterion according to which we propose 
that the 22 miRNA genes shared by unaffected individuals and 
patients should be considered as provisional candidates miRNA 
genes in ASD. On the other hand, hsa-mir-4436b-l and hsa-mir- 
4436b-2, appear at the present time as strong pathogenic 
candidates in ASD. Unforunately, no validated targets have yet 
been identified for these two miRNAs. However, functional 
annotation analysis carried out on predicted mRNA targets for 
these two miRNAs revealed, that 43% (6/14) of the statistically 
significant KEGG pathways obtained with hsa-miR-4436b-3p 
have been already implicated in autism in previous studies 
(referenced in Table 4), a finding which supports a pathogenic 
role for this miRNA. 

During the preparation of our manuscript, Vaishnavi et al 
published an article also addressing the impact of microRNAs 
present in autism-associated Copy Number Variants [16]. Despite, 



several differences (methodological, type of CNV data used) 
distinguishing our study from that of Vaishnavi et al., it is worth 
noting that 8 miRNA genes have been found in common between 
the two studies (chr. 1: hsa-mir-429, hsa-rmr-200a, hsa-mir-200b; 
chr. 2: hsa-mir-149; chr. 22: hsa-mir-185, hsa-mir-1306, hsa-mir- 
1286, hsa-mir-649). 

Conclusions 

Summing up, positive findings of our study include the 
identification of 24 miRNA genes over-represented in de novo 
CNVs from 3 chromosomes. Two miRNA genes from this group, 
hsa-mir-4436b-l and hsa-mir-4436b-2, are likely to play a 
significant pathogenic role in autism since they have not been 
found in CNVs from unaffected individuals. We hope these results 
will lead experimental research towards a better understanding on 
the role played by miRNAs in autism. Finally, we propose that the 
novel procedure used in this study can be effectively applied to 
CNV/miRNA genes association data from other genomic 
disorders beyond autism. 
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Table 4. KEGG pathways enriched for targets of miRNAs hsa-mir-4436b-3p and -5p identified by mirPath 1 . 







KEGG PATHWAY 


p-value 


# genes 


Target Genes 


hsa-mir-4436b-3p 


Lysine degradation 


3.30x1(T 8 


5 


NSD1, KMT2D, PIPOX, KMT2A, 
KMT2E. 




Drug metabolism - cytochrome P450 


6.73 x1(T 6 


1 


CYP3A43. 




Other glycan degradation 


0.000197 


1 


NEU1. 




Glycerophospholipid metabolism 


0.00029 


4 


PNPLA7, PGS1, LPCAT3, 
MBOAT2. 




Notch signaling pathway 


0.000942 


4 


CTBP1, APH1A, NOTCH2, 
MAML1 . 




HIF-1 signaling pathway 


0.006618 


5 


PGF, EGLN3, PLCG1, EN03, 
SLC2A1. 




Vasopressin-regulated water reabsorption 


0.00847 


1 


AQP2. 




Glycine, serine and threonine metabolism 


0.01361 


3 


SHMT2, AMT, PIPOX. 




Renal cell carcinoma 


0.016804 


3 


PGF, EGLN3, SLC2A1. 




One carbon pool by folate 


0.01935 


2 


SHMT2, AMT. 




Pathogenic Escherichia coli infection 


0.04271 


3 


TUBA1A, TUBA8, TUBA1C. 




Graft-versus-host disease 


0.04271 


3 


KIR3DL2, KIR2DL3, KIR3DL1. 




Antigen processing and presentation 


0.044381 


5 


NFYA, NFYB, KIR3DL2, KIR2DL3, 
KIR3DL1. 




Natural killer cell mediated cytotoxicity 


0.048633 


5 


KIR3DL2, KIR2DL3, TNFRSF10B, 
PLCG1, KIR3DL1. 


hsa-mir-4436b-5p 


Prion diseases 


2.16x10~ 73 


1 


PRNP. 




Sulfur relay system 


2.63 x10~ 9 


1 


NFS1. 




Thiamine metabolism 


0.002145 


1 


NFS1. 




Lysine degradation 


0.008932 


1 


WHSC1. 




Transcriptional misregulation in cancer 


0.023567 


2 


NCOR1, WHSC1. 



'in bold pathways or genes previouly implicated in autism, see main text. 
doi:1 0.1 371 /journal.pone.0090947.t004 



Methods 

Data and databases used 

Data on de novo CNVs detected in autistic patients were 
downloaded from three different sources: (i) 71 CNVs from the 
Autism Chromosome Rearrangements Database [49,50], (if) 51 
CNVs from Suppl. Table 8 of [10] and (iii) 75 CNVs from Table 
SI (document S2) of [13]. Throughout our paper the combined 
three above sets of data will be named "APL datasets". CNV 
("APL datasets") data used in this study are reported in the Table 
SI. CNVs and indels detected in individuals of the general 
population were downloaded from the Database of Genomic 
Variants (DGV vers. July 2013) [51]. In our paper this latter 
dataset is named by the acronym "DGV". Names, genomic 
coordinates and chromosomal position for 1,523 human micro- 
RNA genes, were obtained from miRBase (vers. 2012). Readers 
are referred to the article of Griffiths-Jones et al. [52] for an 
explanation of symbols and nomenclature used for miRNAs and 
their genes. Genomic coordinates for start and end of CNVs, 
indels and miRNAs genes were all from Build 37. When necessary, 
conversion of genomic coordinates between different genome 
versions was done using the Liftover tool of the UCSC Genome 
browser [53,54]. Finally, the list of potentially pathogenic miRNAs 
was obtained by excluding miRNA genes over-represented in de 
novo CNVs from patients, but not overlapping CNVs from the 
Database of Genomic Variants (DGV). miRWalk software was 
used to look for experimentally validated mRNA targets for 
miRNAs [33]. miRPath [55,56] was used to identify statistically 
significant KEGG pathways enriched in the list of predicted 



miRNA targets (p<0.05; p-values were corrected to account for 
the False Discovery Rate). 

MAPCNVMIR program (Python) 

Data pre-processing included: (i) computation of the total DNA 
length accounted for by all miRNA genes (L]) and CNVs (L 2 ) in 
each chromosome; (ii) computation of Ri (Ri = Lq /chromosome 
length) and R 2 (R 2 = L 2 / chromosome length); (iii) counts of the 
number of miRNA loci overall included in distinct or overlapping 
CNVs ("hits"). This analysis is performed separately for 
CNV_Gains and CNV_Losses. Thus, for a given chromosome, 
"hits" may consist of distinct and/or identical miRNA genes 
associated to CNVs; on the other hand, we indicate as "unique" 
the distinct miRNA genes overlapping de novo CNVs in patients. 

We developed the MAPCNVMIR programme in Python 
language to achieve a two-fold task: (i) to calculate for each 
chromosome the total length of DNA corresponding to the de novo 
CNV regions and (ii) to map the microRNA genes within the de 
novo CNVs detected in patients using their genomic coordinates 
(Build 37). In order to achieve the first task this programme 
considers the overlapping DNA regions of different CNVs once 
only. The programme first initializes for each chromosome an 
empty array and put the numeric values corresponding to the first 
(C s i) and last (C c i) nucleotide positions ("start" and "end") of the 
first CNV (of the total list of CNVs reported in Table SI) into a 
sub-array. Afterwards, the code considers another CNV and 
compares its initial (C s i) and final (C e 2) nucleotide positions with 
those of the first CNV. Three different cases can occur: 
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1 . the first CNV is totally included in the second one (i.e. 
C S 2<C\ S and C e 2>C e \) and the values inside the sub-array 
are replaced by these new ones; 

2. the second CNV is partially or totally included in the first one 
[i.e. (Q2 < C s \ and C e 2 < C e \) or (C s \ < C s j and C e \ < C e 2) or 
(C s \ < Q2 and C e 2> C f i)] and the sub-array is composed of 
the minimum between C s 2 and C s \ and the maximum between 
C e \ and C e 2; 

3. the second CNV does not overlap the first one and in this case 
a new sub-array with the C s 2 and C e 2 is added to the initial 
array. 

Then another CNV is analyzed and its C s 3 and C e 3 values are 
compared with the values of the first and second CNVs and the 
values of the array are modified according the above-described 
procedure. This procedure is carried out for all CNVs and finally 
an array with the "start" and "end" values of non-overlapping 
DNA regions covered by different CNVs is achieved. The total 
length of CNVs in each chromosome is the sum of the lengths of 
the corresponding DNA regions. Regarding the mapping of the 
microRNA genes within the de novo CNVs detected in patients, for 
each chromosome, the programme first initializes a variable to 
zero and then compares the initial and final nucleotide positions of 
each CNV with the corresponding nucleotide positions of each 
miRNA gene. Let us name C s , C e , M s , M e the numeric values 
corresponding to the first and last nucleotide positions of each 
CNV and microRNA gene (M), respectively, in a particular 
chromosome. If the condition C S <M S and C e > M e is verified 
then the count of the number of miRNA genes is increased by one, 
otherwise the variable remains unchanged. By repeating this 
procedure for each CNV and each miRNA gene, the total number 
of microRNA genes overlapping a CNV is obtained for each 
chromosome. 

Correlation analysis 

On the data obtained by the MAPCNVMIR programme (see 
above) we performed an analysis to evaluate possible correlation 
between the number of miRNA genes overlapping de novo CNVs 
("hits") and the fractional length of CNVs in relation to the size of 
the chromosome (R2 ratios). Briefly, for each chromosome the 
number of "hits" was plotted against the R 2 ratios, thus obtaining 
the linear best fit functions and correlation coefficients were 
calculated. These analyses were performed separately for 
CNV_Gains and CNV_Losses. 

SIMCNVMIR program (FORTRAN) 

The SIMCNVMIR programme was developed to perform 
Monte Carlo randomization analyses, separately for CNV_Gains 
and CNV_Losses in each chromosome. These analyses included: 
(i) simulation of random CNV events, (ii) generation of a frequency 
distribution of "hits" in simulated CNVs, (iii) computation of p- 
values and FDR-adjusted p-values and (iv) selection of chromo- 
somes displaying over-representation of miRNA genes in de novo 
CNVs from patients. 

The analysis aimed at evaluating an over-representation of 
miRNA genes in CNVs was carried out by means of a 
computational simulation procedure implemented by a home- 
made written FORTRAN code. The null hypothesis underlying 
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